- Abstract -

We investigate infinite games on finite graphs where the information flow is perturbed by
nondeterministic signalling delays. It is known that such perturbations make synthesis problems
virtually unsolvable in the general case. In the classical model, where signals are attached to
states, tractable cases are rare and difficult to identify.

Here, we propose a model where signals are detached from control states, and we identify
a subclass on which equilibrium outcomes can be preserved, even if signals are delivered with
a delay that is finitely bounded. To offset the perturbation, our solution procedure combines
responses from a collection of virtual plays following an equilibrium strategy in the
instant-signalling game to synthesise, in a Frankenstein manner, an equivalent equilibrium
strategy for the delayed-signalling game.

1 Introduction

Appropriate behaviour of an interactive system component often depends on events generated
by other components. The ideal situation, in which perfect information is available
across components, occurs rarely in practice; typically a component only receives signals
more or less correlated with the actual events. Apart from imperfect signals generated by the
system components, there are multiple other sources of uncertainty, due to actions of the
system environment or to unreliable behaviour of the infrastructure connecting the components:
for instance, communication channels may delay or lose signals, or deliver them in a
different order than they were emitted. Coordinating components with such imperfect information
to guarantee optimal system runs is a significant, but computationally challenging
problem, in particular when the interaction is of infinite duration. It appears worthwhile to
study the different sources of uncertainty separately rather than as a global phenomenon,
to understand their computational impact on the synthesis of multi-component systems.

In this paper, we consider interactive systems modelled by concurrent games among
multiple players with imperfect information over finite state-transition systems, or labelled
graphs. Each state is associated with a stage game in which the players choose, simultaneously
and independently, a joint action, which triggers a transition to a successor state and
generates a local payoff and possibly further private signals to each player. Plays correspond
to infinite paths through the graph and yield to each player a global payoff according to a
given aggregation function, such as mean payoff or parity. As solutions to such games, we
are interested in synthesising Nash equilibria in pure strategies, i.e., profiles of deterministic
strategies that are self-enforcing when prescribed to all players by a central coordinator.

The basic setting is standard for the automated verification and synthesis of reactive
modules that maintain ongoing interaction with their environment seeking to satisfy a common
global specification. Generally, imperfect information about the play is modelled as
uncertainty about the current state in the underlying transition system, whereas uncertainty





Games with Delays - A Frankenstein Approach 


about the actions of other players is not represented explicitly. This is because the main
question concerns distributed winning strategies, i.e., Nash equilibria in the special case where
all players receive maximal payoff, which here is common to all of them. If each player wins
when all follow the prescribed strategy, unilateral deviations cannot be profitable and any
reaction to them would be ineffective, hence there is no need to monitor actions of other
players. Accordingly, distributed winning strategies can be defined on (potential) histories
of visited states, independently of the history of played actions. Nevertheless, these games
are computationally intractable in general, already with respect to the question of whether
distributed winning strategies exist.

Moreover, if no equilibria that yield maximal payoffs exist for a given game, and we 
consider arbitrary Nash equilibria, it becomes crucial for a player to monitor the actions of 
other players. To illustrate, one elementary scheme for constructing equilibria in games of 
infinite duration relies on grim-trigger strategies: cooperate on the prescribed equilibrium 
path until one player deviates, and at that event, enter a coalition with the remaining 
players and switch to a joint punishment strategy against the deviator. Most procedures for 
constructing Nash equilibria in games for verification and synthesis are based on this scheme, 
which relies essentially on the ability of players to jointly detect the deviation.
This works well under perfect, instant monitoring, where all players have common knowledge 
about the last action performed by every other player. However, the situation becomes more 
complicated when players receive only imperfect signals about the actions of other players, 
and worse, if the signals are not delivered instantly, but with uncertain delays that may be 
different for each player. 

To study the effect of imperfect, delayed monitoring on equilibria in concurrent games,
we introduce a refined model in which observations about actions are separated from
observations about states, and we incorporate a representation for nondeterministic delays for
observing action signals. To avoid the general undecidability results from the basic setting,
we restrict to the case where the players have perfect information about the current state.
Under the assumption that the delays are uniformly bounded, we show that equilibrium
outcomes from the version of a game where signals are delivered instantly can be preserved
in the variant where they are delayed. Towards this, we construct strategies for the
delayed-monitoring game by combining responses for the instant-monitoring variant in such a way
that any play with delayed signals corresponds to a shuffle of several plays with instant
signals, which we call threads. Intuitively, delayed-monitoring strategies are constructed, in
a Frankenstein manner, from a collection of instant-monitoring equilibrium strategies.
Under the additional assumption that the payoff structure is insensitive to shuffling plays, this
procedure allows us to transfer equilibrium values from the instant-monitoring to the
delayed-monitoring game.

We point out that when we set out with finite-state equilibrium strategies for the
instant-monitoring game, the procedure will also yield a profile of finite-state strategies for the
delayed-monitoring game. Hence, the construction is effective, and can be readily applied to
cases where synthesis procedures for finite-state equilibria in games with instant monitoring
exist.

Related literature 

One motivation for studying infinite games with delays comes from the work of Shmaya,
considering sequential games on finitely branching trees (or equivalently, on finite graphs)
where the actions of players are monitored perfectly, but with arbitrary finite delays. In
the setting of two-player zero-sum games with Borel winning conditions, he shows that
these delayed-monitoring games are determined in mixed strategies. Apart from revealing that
infinite games on finite graphs are robust under monitoring delays, the paper is enlightening
for its proof technique, which relies on a reduction of the delayed-monitoring game to a
game with a different structure that features instant monitoring but, in exchange, involves
stochastic moves.

Our analysis is directly inspired by a recent article of Fudenberg, Ishii, and Kominers [5]
on infinitely repeated games with bounded-delay monitoring with stochastically distributed
observation lags. The authors prove a transfer result that is much stronger than ours, which
also covers the relevant case of discounted payoffs (modulo a controlled adjustment of the
discount factor). The key idea for constructing strategies in the delayed-response game is to
modify strategies from the instant-response game by letting them respond with a delay equal
to the maximal monitoring delay, so that all players have received their signals. This amounts
to combining different threads of the instant-monitoring game, one for every time unit in
the delay period. Thus, the proof again involves a reduction between games of different
structure, with the difference that here one game is reduced to several instances of another
one.

Infinitely repeated games correspond to the particular case of concurrent games with only
one state. This allows applying classical methods from strategic games which are no longer
accessible in games with several states [13]. Additionally, the state-transition structure of our
setting induces a combinatorial effort to adapt the delayed-response strategies of Fudenberg,
Ishii, and Kominers: as the play may reach a different state until the monitoring delay
expires, the instant-monitoring threads must be scheduled more carefully to make sure
that they combine to a valid play of the delayed-monitoring variant. In particular, the
time for returning to a particular game state may be unbounded, which makes it hard
to deliver guarantees under discounted payoff functions. As a weaker notion of patience,
suited for games with state transitions, we consider payoff aggregation functions that are
shift-invariant and submixing, as introduced by Gimbert and Kelmendi in their work on
memoryless strategies in stochastic games.

Our model generalises concurrent games of infinite duration over finite graphs. Equilibria 
in such models have been investigated for the perfect-information case, and it was shown 
that it is decidable with relatively low complexity whether equilibria exist, and if this is 
the case, finite-state equilibrium profiles can be synthesised for several relevant cases of
interest.

The basic method for constructing equilibria in the case of perfect information relies on
grim-trigger strategies that react to deviations from the equilibrium path by turning to a
zero-sum coalition strategy opposing the deviating player. Such an approach can hardly work
under imperfect monitoring where deviating actions cannot be observed directly. Alternative
approaches to constructing equilibria without relying on perfect monitoring comprise, at
one extreme, distributed winning strategies for games that allow all players of a coalition to
attain the most efficient outcome, and at the other extreme, Doomsday equilibria,
proposed by Chatterjee et al. in [7], for games where any deviation leads to the most
inefficient outcome, for all players.

In this paper, we prove a transfer result that implies effective solvability of games with a
particular kind of imperfect information, due to imperfect monitoring of actions, and delayed
delivery of signals. Towards this, we first introduce a new model of concurrent games where
observation signals associated to actions are detached from the state information, and in
which the emission and delivery time of signals can be separated by a lag controlled by
Nature. Then, we present the proof argument, which relies on a reduction of a
delayed-monitoring game to a collection of instances of an instant-monitoring game.

2 Games with delayed signals 

There are $n$ players $1, \dots, n$ and a distinguished agent called Nature. We refer to a list
$x = (x^i)_{1 \leq i \leq n}$ that associates one element $x^i$ to every player $i$ as a profile. For any such
profile, we write $x^{-i}$ to denote the list $(x^j)_{1 \leq j \leq n,\, j \neq i}$ where the element of Player $i$ is omitted.
Given an element $x^i$ and a list $x^{-i}$, we denote by $(x^i, x^{-i})$ the full profile $(x^i)_{1 \leq i \leq n}$. For
clarity, we always use superscripts to specify to which player an element belongs. If not
quantified explicitly, we refer to Player $i$ to mean any arbitrary player.

2.1 General model 

For every player $i$, we fix a set $A^i$ of actions, and a set $Y^i$ of signals; these sets are finite.
The action space $A$ consists of all action profiles, and the signal space $Y$ of all signal profiles.

2.1.1 Transition structure. 

The transition structure of a game is described by a game graph $G = (V, E)$ over a finite
set $V$ of states with an edge relation $E \subseteq V \times A \times Y \times V$ that represents transitions labelled
by action and signal profiles. We assume that for each state $v$ and every action profile $a$,
there exists at least one transition $(v, a, y, v') \in E$.

The game is played in stages over infinitely many periods starting from a designated
initial state $v_0 \in V$ known to all players. In each period $t \geq 1$, starting in a state $v_{t-1}$,
every player $i$ chooses an action $a^i_t$, and Nature chooses a transition $(v_{t-1}, a_t, y_t, v_t) \in E$,
which determines a profile $y_t$ of emitted signals and a successor state $v_t$. Then, each player $i$
observes a set of signals depending on the monitoring structure of the game, and the play
proceeds to period $t + 1$ with $v_t$ as the new state.

Accordingly, a play is an infinite sequence $v_0, a_1, y_1, v_1, a_2, y_2, v_2, \dots \in V(AYV)^\omega$ such
that $(v_{t-1}, a_t, y_t, v_t) \in E$, for all $t \geq 1$. A history is a finite prefix $v_0, a_1, y_1, v_1, \dots, a_t, y_t, v_t \in V(AYV)^*$
of a play. We refer to the number $t$ of stages played up to period $t$ as the length of
the history.
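
To make the definition of a history concrete, the following minimal Python sketch checks whether a finite sequence is a history of a given game graph. The representation (transitions as 4-tuples, profiles as tuples), the function name `is_history`, and the toy edge set `EDGES` are assumptions made for illustration only; the toy graph lists just a few transitions and is not total.

```python
from typing import Hashable, Iterable, Set, Tuple

# A transition (v, a, y, v'): state, action profile, signal profile, successor.
Transition = Tuple[Hashable, Hashable, Hashable, Hashable]


def is_history(edges: Set[Transition], v0: Hashable,
               steps: Iterable[Tuple[Hashable, Hashable, Hashable]]) -> bool:
    """Return True if v0, a1, y1, v1, ..., at, yt, vt is a history.

    `steps` lists the triples (a_r, y_r, v_r) for r = 1, ..., t.
    """
    current = v0
    for action, signal, successor in steps:
        if (current, action, signal, successor) not in edges:
            return False
        current = successor
    return True


# Hypothetical two-player toy graph over the states "u" and "w".
EDGES = {
    ("u", ("c", "c"), ("ok", "ok"), "w"),
    ("w", ("c", "c"), ("ok", "ok"), "u"),
    ("u", ("d", "c"), ("ok", "alarm"), "u"),
}

GOOD = [(("c", "c"), ("ok", "ok"), "w"), (("c", "c"), ("ok", "ok"), "u")]
BAD = [(("d", "c"), ("ok", "alarm"), "u")]

print(is_history(EDGES, "u", GOOD))  # history of length 2 starting at "u"
print(is_history(EDGES, "w", BAD))   # no such transition from "w"
```

The length of a history in this representation is simply the number of triples in `steps`.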

2.1.2 Monitoring structure. 

We assume that each player $i$ always knows the current state $v$ and the action $a^i$ she is
playing. However, she is not informed about the actions or signals of the other players.
Furthermore, she may observe the signal $y^i_t$ emitted in a period $t$ only in some later period
or, possibly, never at all.

The signals observed by Player $i$ are described by an observation function

$$\beta^i : V(AYV)^+ \to 2^{Y^i}$$

which assigns to every nontrivial history $\pi = v_0, a_1, y_1, v_1, \dots, a_t, y_t, v_t$ with $t \geq 1$ a set of
signals that were actually emitted along $\pi$ for the player:

$$\beta^i(\pi) \subseteq \{ y^i_r \in Y^i \mid 1 \leq r \leq t \}.$$

For a global history $\pi \in V(AYV)^*$, the observed history of Player $i$ is the sequence

$$\beta^i(\pi) := v_0, a^i_1, z^i_1, v_1, \dots, a^i_t, z^i_t, v_t$$

with $z^i_r = \beta^i(v_0, a_1, y_1, v_1, \dots, a_r, y_r, v_r)$, for all $1 \leq r \leq t$. Analogously, we define the
observed play of Player $i$.

A strategy for Player $i$ is a mapping $s^i : V(A^i Y^i V)^* \to A^i$ that associates to every
observation history $\pi \in V(A^i Y^i V)^*$ an action $s^i(\pi)$. The strategy space $S$ is the set of all
strategy profiles. We say that a history or a play $\pi$ follows a strategy $s^i$, if $a^i_{t+1} = s^i(\beta^i(\pi_t))$,
for all histories $\pi_t$ of length $t \geq 0$ in $\pi$. Likewise, a history or play follows a profile $s \in S$, if
it follows the strategy $s^i$ of each player $i$. The outcome $\mathrm{out}(s)$ of a strategy profile $s$ is the
set of all plays that follow it. Note that the outcome of a strategy profile generally consists
of multiple plays, due to the different choices of Nature.

Strategies may be partial functions. However, we require that for any history $\pi$ that
follows a strategy $s^i$, the observation history $\beta^i(\pi)$ is also included in the domain of $s^i$.

With the above definition of a strategy, we implicitly assume that players have perfect 
recall, that is, they may record all the information acquired along a play. Nevertheless, in 
certain cases, we can restrict our attention to strategy functions computable by automata 
with finite memory. In this case, we speak of finite-state strategies. 

2.1.3 Payoff structure. 

Every transition taken in a play generates an integer payoff to each player $i$, described
by a payoff function $p^i : E \to \mathbb{Z}$. These stage payoffs are combined by a payoff aggregation
function $u : \mathbb{Z}^\omega \to \mathbb{R}$ to determine the utility received by Player $i$ in a play $\pi$ as
$u^i(\pi) := u(p^i(v_0, a_1, y_1, v_1), p^i(v_1, a_2, y_2, v_2), \dots)$. Thus, the profile of utility, or global payoff,
functions $u^i : V(AYV)^\omega \to \mathbb{R}$ is represented by a profile of payoff functions $p^i$ and an
aggregation function $u$, which is common to all players.

We generally consider utilities that depend only on the observed play, that is, $u^i(\pi) = u^i(\pi')$,
for any plays $\pi, \pi'$ that are indistinguishable to Player $i$, that is, $\beta^i(\pi) = \beta^i(\pi')$. To
extend payoff functions from plays to strategy profiles, we set

$$u^i(s) := \inf\{ u^i(\pi) \mid \pi \in \mathrm{out}(s) \}, \text{ for each strategy profile } s \in S.$$

Overall, a game $Q = (G, \beta, u)$ is described by a game graph with a profile of observation
functions and one of payoff functions. We are interested in Nash equilibria, that is, strategy
profiles $s \in S$ such that $u^i(s) \geq u^i(s^{-i}, r^i)$, for every player $i$ and every strategy $r^i \in S^i$.
The payoff $w = u(s)$ generated by an equilibrium $s \in S$ is called an equilibrium payoff. An
equilibrium payoff $w$ is ergodic if it does not depend on the initial state of the game, that
is, there exists a strategy profile $s$ with $u(s) = w$, for every choice of an initial state.

2.2 Instant and bounded-delay monitoring 

We focus on two particular monitoring structures, one where the players observe their
component of the signal profile instantly, and one where each player $i$ observes her private signal
emitted in period $t$ in some period $t + d^i_t$, with a bounded delay $d^i_t \in \mathbb{N}$ chosen by Nature.

Formally, a game with instant monitoring is one where the observation functions $\beta^i$
return, for every history $\pi = v_0, a_1, y_1, v_1, \dots, a_t, y_t, v_t$ of length $t \geq 1$, the private signal
emitted for Player $i$ in the current stage, that is, $\beta^i(\pi) = \{y^i_t\}$, for all $t \geq 1$. As the value
is always a singleton, we may leave out the enclosing set brackets and write $\beta^i(\pi) = y^i_t$.

To model bounded delays, we consider signals with an additional component that
represents a timestamp. Concretely, we fix a set $B^i$ of basic signals and a finite set $D^i \subseteq \mathbb{N}$ of
possible delays, for each player $i$, and consider the product $Y^i := B^i \times D^i$ as a new set of
signals. Then, a game with delayed monitoring is a game over the signal space $Y$ with
observation functions $\beta^i$ that return, for every history $\pi = v_0, a_1, (b_1, d_1), v_1, \dots, a_t, (b_t, d_t), v_t$
of length $t \geq 1$, the value

$$\beta^i(\pi) = \{ (b^i_r, d^i_r) \in B^i \times D^i \mid r \geq 1,\ r + d^i_r = t \}.$$

In our model, the role of Nature is limited to choosing the delays for observing the emitted
signals. Concretely, we postulate that the basic signals and the stage payoffs associated
to transitions are determined by the current state and the action profile chosen by the
players, that is, for every global state $v$ and action profile $a$, there exists a unique profile
$b$ of basic signals and a unique state $v'$ such that $(v, a, (b, d), v') \in E$, for some $d \in D$;
moreover, for any other delay profile $d' \in D$, we require $(v, a, (b, d'), v') \in E$, and also that
$p^i(v, a, (b, d), v') = p^i(v, a, (b, d'), v')$. Here again, $D$ denotes the delay space composed of the
sets $D^i$. Notice that under this assumption, the plays in the outcome of a strategy profile $s$
differ only by the value of the delays. In particular, all plays in $\mathrm{out}(s)$ yield the same payoff.
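
The observation function for delayed monitoring can be sketched in a few lines of Python. This is an illustration under assumed conventions, not part of the original construction: a history is given as the list of its steps $(a_r, (b_r, d_r), v_r)$ for periods $r = 1, \dots, t$ (the initial state is omitted because it plays no role here), profiles are tuples indexed by player, and the names `delayed_observation` and `HISTORY` are hypothetical.

```python
def delayed_observation(history, player):
    """Signals delivered to `player` in the last period t of `history`.

    `history` is a list of steps (a_r, (b_r, d_r), v_r) for r = 1, ..., t,
    where b_r and d_r are profiles (tuples) of basic signals and delays.
    A signal emitted in period r with delay d is delivered in period r + d.
    """
    t = len(history)
    return {
        (b[player], d[player])
        for r, (_action, (b, d), _state) in enumerate(history, start=1)
        if r + d[player] == t
    }


# Hypothetical single-player history: signals "x", "y", "z" are emitted in
# periods 1, 2, 3 with delays 2, 1, 0, so all of them arrive in period 3.
HISTORY = [
    (("a",), (("x",), (2,)), "v1"),
    (("a",), (("y",), (1,)), "v2"),
    (("a",), (("z",), (0,)), "v3"),
]

print(delayed_observation(HISTORY[:1], 0))  # nothing delivered in period 1
print(delayed_observation(HISTORY[:2], 0))  # nothing delivered in period 2
print(delayed_observation(HISTORY, 0))      # three signals delivered at once
```

The example shows why the observation is a set rather than a single signal: several signals may arrive in the same period, and in other periods none arrives.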

To investigate the effect of observation delays, we will relate the delayed-monitoring and
instant-monitoring variants of a game. Given a game $Q$ with delayed monitoring, the corresponding
instant-monitoring game $Q'$ is obtained by projecting every signal $y^i = (b^i, d^i)$ onto its first
component $b^i$ and then taking the transition and payoff structure induced by this projection.
As we assume that transitions and payoffs are independent of delays, the operation is well
defined.

Conversely, given a game $Q$ with instant monitoring and a delay space $D$, the corresponding
game $Q'$ with delayed monitoring is obtained by extending the set $B^i$ of basic
signals in $Q$ to $B^i \times D^i$, for each player $i$, and by lifting the transition and payoff
structure accordingly. Thus, the game $Q'$ has the same states as $Q$ with transitions
$E' := \{ (v, a, (b, d), w) \mid (v, a, b, w) \in E,\ d \in D \}$, whereas the payoff functions are given by
$p'^i(v, a, (b, d), w) := p^i(v, a, b, w)$, for all $d \in D$.
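
The lifting from instant to delayed monitoring is a purely syntactic operation on the edge relation. The following Python sketch spells it out under assumed conventions: edges are tuples $(v, a, b, w)$, `payoff` maps each edge to its profile of stage payoffs, and `delay_sets` lists the finite sets $D^i$ in player order. The function name `lift_to_delayed` and the sample data are hypothetical.

```python
from itertools import product


def lift_to_delayed(edges, payoff, delay_sets):
    """Build E' and the lifted payoffs from an instant-monitoring game.

    Every edge (v, a, b, w) is copied once for each delay profile d, giving
    (v, a, (b, d), w) with the same stage payoffs as the original edge.
    """
    delay_profiles = list(product(*[sorted(ds) for ds in delay_sets]))
    lifted_edges, lifted_payoff = set(), {}
    for (v, a, b, w) in edges:
        for d in delay_profiles:
            edge = (v, a, (b, d), w)
            lifted_edges.add(edge)
            lifted_payoff[edge] = payoff[(v, a, b, w)]
    return lifted_edges, lifted_payoff


# Hypothetical one-player game with a single transition and delays 0, 1, 2.
EDGES = {("u", ("c",), ("s",), "w")}
PAYOFF = {("u", ("c",), ("s",), "w"): (1,)}
LIFTED_EDGES, LIFTED_PAYOFF = lift_to_delayed(EDGES, PAYOFF, [{0, 1, 2}])

print(len(LIFTED_EDGES))  # one copy of the edge per delay profile
```

By construction, the lifted game satisfies the postulate that transitions and stage payoffs do not depend on the delays chosen by Nature.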

As the monitoring structure of games with instant or delayed monitoring is fixed, it
is sufficient to describe the game graph together with the profile of payoff functions, and
to indicate the payoff aggregation function. It will be convenient to include the payoff
associated to a transition as an additional edge label and thus represent the game simply
as a pair $Q = (G, u)$ consisting of a finite labelled game graph and an aggregation function
$u : \mathbb{Z}^\omega \to \mathbb{R}$.

2.3 Shift-invariant, submixing utilities 

Our result applies to a class of games where the payoff-aggregation functions are invariant
under removal of prefix histories and shuffling of plays. Gimbert and Kelmendi identify
these properties as a guarantee for the existence of simple strategies in stochastic zero-sum
games.

A function $f : \mathbb{Z}^\omega \to \mathbb{R}$ is shift-invariant, if its value does not change when adding an
arbitrary finite prefix to the argument, that is, for every sequence $\alpha \in \mathbb{Z}^\omega$ and each element
$a \in \mathbb{Z}$, we have $f(a\alpha) = f(\alpha)$.

An infinite sequence $\alpha \in \mathbb{Z}^\omega$ is a shuffle of two sequences $\beta, \gamma \in \mathbb{Z}^\omega$, if $\mathbb{N}$ can be partitioned
into two infinite sets $I = \{i_0, i_1, \dots\}$ and $J = \{j_0, j_1, \dots\}$ such that $\alpha_{i_k} = \beta_k$ and $\alpha_{j_k} = \gamma_k$,
for all $k \in \mathbb{N}$. A function $f : \mathbb{Z}^\omega \to \mathbb{R}$ is called submixing if, for every shuffle $\alpha$ of two
sequences $\beta, \gamma \in \mathbb{Z}^\omega$, we have

$$\min\{f(\beta), f(\gamma)\} \leq f(\alpha) \leq \max\{f(\beta), f(\gamma)\}.$$

In other words, the image of a shuffle product always lies between the images of its factors.

The proof of our theorem relies on payoff aggregation functions $u : \mathbb{Z}^\omega \to \mathbb{R}$ that are
shift-invariant and submixing. Many relevant game models used in economics, game theory,
and computer science satisfy this restriction. Prominent examples are mean payoff or limsup
payoff, which aggregate sequences of stage payoffs $p_1, p_2, \dots$ by setting:


$$\text{mean-payoff}(p_1, p_2, \dots) := \limsup_{t \geq 1} \frac{1}{t} \sum_{r=1}^{t} p_r \quad \text{and} \quad \text{limsup}(p_1, p_2, \dots) := \limsup_{t \geq 1} p_t.$$

Finally, parity conditions, which map non-negative integer payoffs $p_1, p_2, \dots$ called
priorities to $\text{parity}(p_1, p_2, \dots) = 1$ if the least priority that occurs infinitely often is even, and $0$
otherwise, also satisfy the conditions.
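
As a sanity check of the submixing inequality, the following Python sketch evaluates mean payoff and parity on purely periodic sequences, where both values can be read off a single cycle: the limsup of the averages equals the cycle average, and every priority in the cycle occurs infinitely often. It then tests one particular shuffle, the one that takes even positions from $\beta$ and odd positions from $\gamma$. The helper names and the sample cycles are assumptions for illustration, and a check on one shuffle of two periodic sequences is of course not a proof of the general property.

```python
from fractions import Fraction
from math import gcd


def mean_payoff(cycle):
    """Mean payoff of the sequence that repeats `cycle` forever."""
    return Fraction(sum(cycle), len(cycle))


def parity(cycle):
    """Parity value of the sequence that repeats `cycle` forever."""
    return 1 if min(cycle) % 2 == 0 else 0


def alternating_shuffle(beta_cycle, gamma_cycle):
    """Cycle of the shuffle with alpha[2k] = beta[k], alpha[2k+1] = gamma[k]."""
    n_beta, n_gamma = len(beta_cycle), len(gamma_cycle)
    n = n_beta * n_gamma // gcd(n_beta, n_gamma)  # least common multiple
    cycle = []
    for k in range(n):
        cycle.append(beta_cycle[k % n_beta])
        cycle.append(gamma_cycle[k % n_gamma])
    return cycle


def lies_between(f, beta_cycle, gamma_cycle):
    """Check min{f(beta), f(gamma)} <= f(alpha) <= max{f(beta), f(gamma)}."""
    alpha = alternating_shuffle(beta_cycle, gamma_cycle)
    values = (f(beta_cycle), f(gamma_cycle))
    return min(values) <= f(alpha) <= max(values)


BETA = [0, 4]      # mean payoff 2, least priority 0 (even)
GAMMA = [1, 1, 7]  # mean payoff 3, least priority 1 (odd)

print(mean_payoff(alternating_shuffle(BETA, GAMMA)))
print(lies_between(mean_payoff, BETA, GAMMA), lies_between(parity, BETA, GAMMA))
```

Shift-invariance is implicit in this representation: since only the repeated cycle is passed to the functions, any finite prefix is ignored.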


2.4 The transfer theorem 

We are now ready to formulate our result stating that, under certain restrictions,
equilibrium profiles from games with instant monitoring can be transferred to games with delayed
monitoring.

► Theorem 1. Let $Q$ be a game with instant monitoring and shift-invariant submixing
payoffs, and let $D$ be a finite delay space. Then, for every ergodic equilibrium payoff $w$ in
$Q$, there exists an equilibrium of the $D$-delayed monitoring game $Q'$ with the same payoff $w$.

The proof relies on constructing a strategy for the delayed-monitoring game while
maintaining a collection of virtual plays of the instant-monitoring game on which the given
strategy is queried. The responses are then combined according to a specific schedule to
ensure that the actual play arises as a shuffle of the virtual plays.

3 Proof 

Consider a game $Q = (G, u)$ with instant monitoring where the payoff aggregation function $u$
is shift-invariant and submixing, and suppose that $Q$ admits an equilibrium profile $s$. For an
arbitrary finite delay space $D$, let $Q'$ be the delayed-monitoring variant of $Q$. In the following
steps, we will construct a strategy profile $s'$ for $Q'$ that is in equilibrium and yields the same
payoff $u(s')$ as $s$ in $Q$.

3.1 Unravelling small cycles 

To minimise the combinatorial overhead for scheduling delayed responses, it is convenient to 
ensure that, whenever the play returns to a state v, the signals emitted at the previous visit 
at v have been received by all players. If every cycle in the given game graph G is at least 
as long as any possible delay, this is clearly satisfied. Otherwise, the graph can be expanded 
to avoid small cycles, e.g., by taking the product with a cyclic group of order equal to the 
maximal delay. 

Concretely, let $m$ be the greatest delay among $\max D^i$, for all players $i$. We define a new
game graph $\hat{G}$ as the product of $G$ with the additive group $\mathbb{Z}_m$ of integers modulo $m$, over the
state set $\{ v_j \mid v \in V, j \in \mathbb{Z}_m \}$ by allowing transitions $(v_j, a, b, v'_{j+1})$, for every $(v, a, b, v') \in E$
and all $j \in \mathbb{Z}_m$, and by assigning stage payoffs $\hat{p}^i(v_j, a, b, v'_{j+1}) := p^i(v, a, b, v')$, for all
transitions $(v, a, b, v') \in E$. Obviously, every cycle in this game graph has length at least $m$.
Moreover, the games $(G, u)$ and $(\hat{G}, u)$ are equivalent: since the index component $j \in \mathbb{Z}_m$
is not observable to the players, the two games have the same sets of strategies, and profiles
of corresponding strategies yield the same observable play outcome, and hence the same
payoffs.
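
The product with the cyclic group can be sketched directly on the edge relation. In the Python illustration below, product states are pairs `(v, j)`, the function names `unravel` and `shortest_cycle` are hypothetical, and the breadth-first search is only a small helper for checking the cycle-length claim on a toy graph.

```python
from collections import deque


def unravel(edges, payoff, m):
    """Product of the game graph with the integers modulo m."""
    new_edges, new_payoff = set(), {}
    for (v, a, b, w) in edges:
        for j in range(m):
            edge = ((v, j), a, b, (w, (j + 1) % m))
            new_edges.add(edge)
            new_payoff[edge] = payoff[(v, a, b, w)]
    return new_edges, new_payoff


def shortest_cycle(edges):
    """Length of a shortest cycle, or None if the graph is acyclic."""
    succ = {}
    for (v, _a, _b, w) in edges:
        succ.setdefault(v, set()).add(w)
    best = None
    for start in succ:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in succ.get(x, ()):
                if y == start:
                    length = dist[x] + 1
                    best = length if best is None else min(best, length)
                elif y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
    return best


# Hypothetical one-state graph with a self-loop, unravelled with m = 3.
LOOP = {("u", ("c",), ("s",), "u")}
LOOP_PAYOFF = {("u", ("c",), ("s",), "u"): (0,)}
UNRAVELLED, UNRAVELLED_PAYOFF = unravel(LOOP, LOOP_PAYOFF, 3)

print(shortest_cycle(LOOP), shortest_cycle(UNRAVELLED))
```

Since every transition of the product increases the index by one modulo $m$, the length of any cycle is a multiple of $m$, which is what the toy check reflects.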

In conclusion, we can assume without loss of generality that each cycle in the game
graph $G$ is longer than the maximal delay $\max D^i$, for all players $i$.


3.2 The Frankenstein procedure 

We describe a strategy $f^i$ for Player $i$ in the delayed-monitoring game $Q'$ by a reactive
procedure that receives observations of states and signals as input and produces actions as
output.

The procedure maintains a collection of virtual plays of the instant-monitoring game.
More precisely, these are observation histories for Player $i$ following the strategy $s^i$ in $Q$,
which we call threads. The observations collected in a thread
$\pi = v_0, a^i_1, (b^i_1, d^i_1), v_1, \dots, a^i_r, (b^i_r, d^i_r), v_r$
are drawn from the play of the main delayed-monitoring game $Q'$. Due to delays, it may
occur that the signal $(b^i_r, d^i_r)$ emitted in the last period of a thread has not yet been received.
In this case, the signal entry is replaced by a special symbol $\#$, and we say that the thread
is pending. As soon as the player receives the signal, the placeholder $\#$ is overwritten with
the actual value, and the thread becomes active. Active threads $\pi$ are used to query the
strategy $s^i$; the prescribed action $a^i = s^i(\pi)$ is played in the main delayed-monitoring game
and it is also used to continue the thread of the virtual instant-monitoring game.

To be continued, a thread must be active and its current state needs to match the actual
state of the play in the delayed-monitoring game. Intuitively, threads advance more slowly
than the actual play, so we need multiple threads to keep pace with it. Here, we use a
collection of $|V| + 1$ threads, indexed by an ordered set $K = V \cup \{\varepsilon\}$. The main task of the
procedure is to schedule the continuation of threads. To do so, it maintains a data structure
$(\tau, h)$ that consists of the threads $\tau = (\tau_k)_{k \in K}$ and a scheduling sequence $h = h[0], \dots, h[t]$
of indices from $K$, at every period $t \geq 0$ of the actual play. For each previous $r < t$, the
entry $h[r]$ points to the thread according to which the action of period $r + 1$ in the actual
play has been prescribed; the last entry $h[t]$ points to an active thread that is currently
scheduled for prescribing the action to be played next.

The version of Procedure Frankenstein$^i$ for Player $i$, given below, is parametrised by
the game graph $G$ with the designated initial state, the delay space $D^i$, and the given
equilibrium strategy $s^i$ in the instant-monitoring game. In the initialisation phase, the
initial state $v_0$ is stored in the initial thread $\tau_\varepsilon$ to which the current scheduling entry $h[0]$
points. The remaining threads are initialised, each with a different position from $V$. Then,
the procedure enters a non-terminating loop along the periods of the actual play. In every
period $t$, it outputs the action prescribed by strategy $s^i$ for the current thread scheduled by
$h[t]$ (Line 5). Upon receiving the new state, this current thread is updated by recording the
played action and the successor state; as the signal emitted in the instant-monitoring play
is not available in the delayed-monitoring variant, it is temporarily replaced by $\#$, which
marks the current thread as pending (Line 7). Next, an active thread that matches the new
state is scheduled (Line 9), and the received signals are recorded with the pending threads
to which they belong (Lines 11-14). As a consequence, these threads become active.

Procedure: Frankenstein$^i$ ($G$, $v_0$, $D^i$, $s^i$)
    // initialisation
 1  $\tau_\varepsilon := v_0$; $h[0] := \varepsilon$
 2  foreach $v \in V$ do $\tau_v := v$
    // play loop
    for $t = 0$ to $\omega$ do
 3      $k := h[t]$
 4      assert ($\tau_k$ is an active thread)
 5      play action $a^i := s^i(\tau_k)$                       // $a^i_{t+1}$
 6      receive new state $v$                                  // $v_{t+1}$
 7      update $\tau_k := \tau_k\, a^i\, \#\, v$
 8      assert (there exists an index $k' \neq k$ such that $\tau_{k'}$ ends at state $v$)
 9      set $h[t + 1]$ to the least such index $k'$
10      receive observation $z^i \subseteq B^i \times D^i$     // $z^i_{t+1}$
11      foreach $(b^i, d^i) \in z^i$ do
12          $k := h[t - d^i]$
13          assert ($\tau_k = \rho\, \#\, v'$, for some prefix $\rho$, state $v'$)
14          update $\tau_k := \rho\, (b^i, d^i)\, v'$
        end
    end
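
To illustrate how the scheduling works, here is a hedged Python sketch of the procedure for a single player, together with a toy run. Everything beyond the pseudocode is an assumption made for illustration: thread indices are integers (index 0 stands for $\varepsilon$, index $j + 1$ for the $j$-th state in sorted order), the strategy is an arbitrary callable that receives the thread with delays projected away, and the environment in `run_demo` is a deterministic cycle over three states with delays of at most two periods, so that every cycle is longer than any delay.

```python
PENDING = "#"  # placeholder for a signal that has not been delivered yet


class Frankenstein:
    """One player's reactive procedure; comments refer to pseudocode lines."""

    def __init__(self, states, v0, strategy):
        # Line 1 and Line 2: thread 0 is the epsilon thread, then one per state.
        self.threads = [{"start": v0, "steps": []}]
        self.threads += [{"start": v, "steps": []} for v in sorted(states)]
        self.h = [0]              # scheduling sequence h[0], h[1], ...
        self.strategy = strategy  # stands for s^i (hypothetical callable)
        self.last_action = None

    def end_state(self, k):
        thread = self.threads[k]
        return thread["steps"][-1][2] if thread["steps"] else thread["start"]

    def is_active(self, k):
        steps = self.threads[k]["steps"]
        return not steps or steps[-1][1] != PENDING

    def view(self, k):
        """Thread k as an instant-monitoring history (delays projected away)."""
        thread = self.threads[k]
        return (thread["start"],) + tuple(
            (a, signal[0], v) for a, signal, v in thread["steps"])

    def act(self):
        k = self.h[-1]                                   # Line 3
        assert self.is_active(k)                         # Line 4
        self.last_action = self.strategy(self.view(k))   # Line 5
        return self.last_action

    def observe(self, new_state, delivered):
        t = len(self.h) - 1
        k = self.h[t]
        self.threads[k]["steps"].append(
            [self.last_action, PENDING, new_state])      # Line 7
        candidates = [j for j in range(len(self.threads))
                      if j != k and self.end_state(j) == new_state]
        assert candidates                                # Line 8
        self.h.append(min(candidates))                   # Line 9
        for b, d in delivered:                           # Lines 10-11
            assert 0 <= d <= t
            step = self.threads[self.h[t - d]]["steps"][-1]  # Line 12
            assert step[1] == PENDING                    # Line 13
            step[1] = (b, d)                             # Line 14


def run_demo(periods=12, delay_pattern=(2, 0, 1)):
    """Toy environment: states 0 -> 1 -> 2 -> 0, signals echo the action."""
    proc = Frankenstein(states=[0, 1, 2], v0=0,
                        strategy=lambda view: "a" if len(view) % 2 else "b")
    state, in_flight = 0, []
    for t in range(periods):
        action = proc.act()
        state = (state + 1) % 3
        d = delay_pattern[t % len(delay_pattern)]
        in_flight.append((t + 1 + d, action, d))  # emitted in period t + 1
        delivered = {(b, dd) for due, b, dd in in_flight if due == t + 1}
        in_flight = [item for item in in_flight if item[0] != t + 1]
        proc.observe(state, delivered)
    return proc


print(run_demo().h)
```

In this toy run, the four threads end up being scheduled in a round-robin fashion, so each thread is continued only every fourth period, after its pending signal has arrived.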


3.3 Correctness 

In the following, we argue that the procedure Frankenstein$^i$ never violates the assertions in
Lines 4, 8, and 13 while interacting with Nature in the delayed-monitoring game $Q'$, and thus
implements a valid strategy for Player $i$.

Specifically, we show that for every history

$$\pi = v_0, a_1, (b_1, d_1), v_1, \dots, a_t, (b_t, d_t), v_t$$

in the delayed-monitoring game that follows the prescriptions of the procedure up to period
$t \geq 0$, (1) the scheduling function $h[t] = k$ points to an active thread $\tau_k$ that ends at state $v_t$,
and (2) for the state $v_{t+1}$ reached by playing $a^i_{t+1} := s^i(\tau_k)$ at $\pi$, there exists an active
thread $\tau_{k'}$ that ends at $v_{t+1}$. We proceed by induction over the period $t$. In the base case,
both properties hold, due to the way in which the data structure is initialised: the (trivial)
thread $\tau_\varepsilon$ is active, and for any successor state $v_1$ reached by $a^i_1 := s^i(\tau_\varepsilon)$, there is a fresh
thread $\tau_{v_1}$ that is active. For the induction step in period $t + 1$, property (1) follows from
property (2) of period $t$. To verify that property (2) holds, we distinguish two cases. If $v_{t+1}$
did not occur previously in $\pi$, the initial thread $\tau_{v_{t+1}}$ still consists of the trivial history $v_{t+1}$,
and it is thus active. Else, let $r < t$ be the period in which $v_{t+1}$ occurred last. Then, for
$k' = h[r]$, the thread $\tau_{k'}$ ends at $v_{t+1}$. Moreover, by our assumption that the cycles in $G$
are longer than any possible delay, it follows that the signals emitted in period $r < t - m$
have been received along $\pi$ and were recorded (Lines 12-14). Hence, $\tau_{k'}$ is an active thread
ending at $v_{t+1}$, as required.

To see that the assertion of Line 13 is never violated, we note that every observation
history $\beta^i(\pi)$ of the actual play $\pi$ in $Q'$ up to period $t$ corresponds to a finitary shuffle of
the threads $\tau$ in the $t$-th iteration of the play loop, described by the scheduling function $h$:
the observations $(a^i_r, (b^i_r, d^i_r), v_r)$ associated to any period $r \leq t$ appear in the thread that
prescribed the action of period $r$, with the signal $(b^i_r, d^i_r)$ if it was delivered by period $t$,
and with the placeholder $\#$ otherwise.

In summary, it follows that the reactive procedure Frankenstein$^i$ never halts, and it
returns an action for every observation $\beta^i(\pi)$ of a history $\pi$ that follows it. Thus, the
procedure defines a strategy $f^i : V(A^i Y^i V)^* \to A^i$ for Player $i$.

3.4 Equilibrium condition 

Finally, we show that the interplay of the strategies $f^i$ described by the reactive procedure
Frankenstein$^i$, for each player $i$, constitutes an equilibrium profile for the delayed-monitoring
game $Q'$ yielding the same payoff as $s$ in $Q$.

According to our remark in the previous subsection, every transition taken in a play $\pi$
that follows the strategy $f^i$ in $Q'$ is also observed in some thread history, which in turn
follows $s^i$. Along the non-terminating execution of the reactive Frankenstein$^i$ procedure,
some threads must be scheduled infinitely often, and thus correspond to observations of
plays in the instant-monitoring game $Q$. We argue that the observation by Player $i$ of a play
that follows the strategy $f^i$ corresponds to a shuffle of such infinite threads (after discarding
finite prefixes).

To make this more precise, let us fix a play $\pi$ that follows $f^i$ in $Q'$, and consider the infinite
scheduling sequence $h[0], h[1], \dots$ generated by the procedure. Since there are finitely many
thread indices, some must appear infinitely often in this sequence; we denote by $L^i \subseteq K$ the
subset of these indices, and look at the least period $\ell^i$ after which only threads in $L^i$ are
scheduled. Then, the suffix of the observation $\beta^i(\pi)$ from period $\ell^i$ onwards can be written
as a $|L^i|$-partite shuffle of suffixes of the threads $\tau_k$ for $k \in L^i$.

By our assumption that the payoff aggregation function $u$ is shift-invariant and submixing,
it follows that the payoff $u^i(\pi)$ lies between $\min\{u^i(\tau_k) \mid k \in L^i\}$ and
$\max\{u^i(\tau_k) \mid k \in L^i\}$. Now, we apply this reasoning to all players to show that $f$ is an
equilibrium profile with payoff $u(s)$.

To see that the profile $f$ in the delayed-monitoring game $Q'$ yields the same payoff
as $s$ in the instant-monitoring game $Q$, consider the unique play $\pi$ that follows $f$, and
construct $L^i$, for all players $i$, as above. Then, all threads of all players $i$ follow $s^i$, which
by ergodicity implies, for each infinite thread $\tau_k$ with $k \in L^i$, that $u^i(\tau_k) = u^i(s)$. Hence
$\min\{u^i(\tau_k) \mid k \in L^i\} = \max\{u^i(\tau_k) \mid k \in L^i\} = u^i(\pi)$, for each player $i$, and therefore
$u(f) = u(s)$.

To verify that $f$ is indeed an equilibrium profile, consider a strategy $g^i$ for the
delayed-monitoring game and look at the unique play $\pi$ that follows $(f^{-i}, g^i)$ in $Q'$. Towards a
contradiction, assume that $u^i(\pi) > u^i(f)$. Since $u^i(\pi) \leq \max\{u^i(\tau_k) \mid k \in L^i\}$, there must
exist an infinite thread $\tau_k$ with index $k \in L^i$ such that $u^i(\tau_k) > u^i(f) = u^i(s)$. But $\tau_k$
corresponds to the observation $\beta^i(\rho)$ of a play $\rho$ that follows $s^{-i}$ in $Q$, and since $s$ is an
equilibrium profile, we obtain $u^i(s) \geq u^i(\rho) = u^i(\tau_k)$, a contradiction.

This concludes the proof of our theorem. 

3.5 Finite-state strategies 

The transfer theorem makes no assumption on the complexity of equilibrium strategies in the 
instant-monitoring game at the outset; informally, we may think of these strategies as oracles 
that the Frankenstein procedure can query. Moreover, the procedure itself runs for infinite
time along the periods of the play, and the data structure it maintains grows unboundedly.



Dietmar Berwanger and Marie van den Bogaard 




However, if we set out with an equilibrium profile of finite-state strategies, it is
straightforward to rewrite the Frankenstein procedure as a finite-state automaton: instead of storing
the full histories of threads, it is sufficient to maintain the current state reached by
the strategy automaton for the relevant player after reading this history, over a period that
is sufficiently long to cover all possible delays.

► Corollary 2. Let $Q$ be a game with instant monitoring and shift-invariant submixing
payoffs, and let $D$ be a finite delay space. Then, for every ergodic payoff $w$ in $Q$ generated
by a profile of finite-state strategies, there exists an equilibrium of the $D$-delayed monitoring
game $Q'$ with the same payoff $w$ that is also generated by a profile of finite-state strategies.